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Abstract 


The combinatorial number of possible methylomes in biological time and space is astro- 
nomical. Consequently, the computational analysis of methylomes needs to cater for a va- 
riety of data, throughput and resolution. Here, we review recent advances in 2™4 
generation sequencing (2GS) with a focus on the different methods used for the analysis 
of MeDIP-seq data. The challenges and opportunities presented by the integration of 
methylation data with other genomic data types are discussed as is the potential impact 
of emerging 3" generation sequencing (3GS) based technologies on methylation analysis. 
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1. Introduction 


For many years it’s been widely known by scientists that, despite possessing the same DNA 
sequence, not all genes can be active in all cells within an organism all of the time. It is through 
the regulation of genes that we are able to see phenotypic differences between cells with 
identical genotypes. In the late 1930’s, Conrad Waddington introduced the term ‘epigenetic 
landscape’ to provide a metaphor for the cellular mechanisms leading to this regulation [1]. 
These regulatory, or epigenetic, patterns can be seen to persistently influence gene expression 
levels through cell division. Hence, epigenetics involves the study of marks and mechanisms 
that control gene expression in a mitotically and potentially meiotically heritable manner [2]. 


One such mechanism is DNA methylation (or more specifically cytosine methylation), an 
important epigenetic modification. DNA methylation, in conjunction with histone modifica- 
tions, remodeling complexes and non-coding RNAs, plays a vital role in regulating genome 
dynamics. In combination with these other modifications, DNA methylation can control the 
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accessibility of the underlying DNA to the transcriptional machinery through the modulation 
of chromatin density. As a result DNA methylation is involved in a diverse range of processes 
including embryogenesis, genomic imprinting, cellular differentiation, DNA protein interac- 
tions and gene regulation [3]. 


In mammalian genomes, DNA methylation occurs almost exclusively at palindromic CpG 
dinucleotides. CpG dinucleotides are found throughout the genome but are significantly 
depleted (21% of that expected in the human genome [4]) in comparison to other dinucleotide 
combinations. This is due to the hypermutability of methylated cytosines [5] where sponta- 
neous deamination to thymine occurs. However as a result of chance or potentially due to their 
functional importance, a minority of CpGs are maintained against this loss. 


The surviving CpGs are often found at a high density in localised genomic regions termed 
CpG islands (CGIs) [3]. Unlike the majority of CpGs, these regions, of approximately 1kb in 
length (though different algorithms produce different CGI predictions [6]), are largely 
unmethylated and have been found to overlap the promoter regions of 60-70% of all human 
genes, representing all constitutively expressed genes and approximately 40% of those 
displaying tissue specific expression patterns [7, 8]. Unmethylated CGIs are able to recruit CpG 
binding proteins such as Cfp1 [9], these in turn lead to the modification of histone tails [10] 
and the formation of permissive chromatin domains, potentially enabling the initiation of 
transcription [11]. In contrast, methylated CGIs are associated with gene silencing. This 
silencing can occur via various routes such as inhibiting the recruitment of DNA binding 
proteins from their target sites [12] or alternatively through the recruitment of methyl-CpG- 
binding domain (MBD) proteins that in turn recruit histone modifying complexes to the 
methylated sites [13]. 


Whilst methylation changes at CGIs is perhaps the most studied region, methylation occurs 
in other genomic locations as well. CpG island shores represent regions of lower CpG density 
flanking a CGI. They are generally defined as reaching 2kb upstream and downstream of an 
island. It has been found that most tissue specific methylation occurs in these shore regions 
rather than the islands [14, 15]. Additionally, high levels of DNA methylation can be found in 
repetitive genomic regions. Rather than directly regulating the transcriptional potential of a 
gene, this methylation is seen to prevent chromosomal instability [16-18]. 


Although DNA methylation is largely found in the CpG dinucleotide, it has also been reported 
in humans and mouse at CHG and CHH sites [19, 20]. In comparison with a methylated CpG 
site, methylated non-CpG sites display a much lower level of methylation within a cell 
population [21] and show lower conservation between cell lines [22]. The mechanisms and 
functionality of non-CpG methylation are currently unclear but the levels appear to decrease 
during differentiation whilst being restored in induced pluripotent stem cells. This potentially 
suggests a role in the origin and maintenance of the pluripotent state [19, 23, 24]. 


DNA methylation changes have been associated with numerous conditions. Many cancers 
have shown hypomethylation at repetitive sequences thus promoting chromosomal instabil- 
ity. Examples include the LINE repeat L1 in a range of tumours [25] and satellite repeats 
ALRa and SATR1 in peripheral nerve sheath tumours [26]. Hypomethylation at specific 
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promoters can lead to aberrant expression of oncogenes, whilst in contrast hypermethylation 


at specific island or shore sites can lead to transcriptional inactivation of genes involved in 


pathways such as DNA repair and apoptosis [2, 13]. Neurological disorders such as Alzheim- 


er's and Multiple sclerosis have been associated with aberrant DNA methylation as have 


autoimmune diseases such as ICF syndrome and rheumatoid arthritis [2]. 


2. Methods for the study of genome-wide DNA methylation 


Even within the relatively new field of second-generation (or next-generation) sequencing 
(2GS), a plethora of methods exist for the exploration of DNA methylation and the analysis of 


the ensuing data (Table 1). Such methods include the use of restriction endonucleases, or the 


bisulphite conversion of DNA. Here we discuss in detail the analysis of affinity enrichment 
techniques, specifically MeDIP-seq. For a full review of other methods see [27]. 


Software 


Batman 


Method 


MeDIP-seq 


Summary 


Bayesian tool for methylation analysis of MeDIP 


profiles 


Publication 


[33] 


Bismark 


Bisulphite 


Maps bisulfite treated sequencing reads through in- 
silico bisulfite conversion of both reads and 
genome. Performs methylation calling in a quick 


and easy-to-use fashion. 


[81] 


Bis-SNP 


Bisulphite 


Estimates methylation probabilities of different 
cytosine context to determine genotypes and 


methylation levels simultaneously. 


[61] 


BSMAP 


Bisulphite 


Mapping software for bisulphite sequencing. 
BSMAP aligns the Ts in the reads to both Cs and Ts 
in the reference by building a “seed” index of the 


reference genome. 


[82] 


BS-Seeker 


Bisulphite 


Accurate and fast mapping of bisulfite-treated short 
reads through in-silico bisulfite conversion of both 


reads and genome. 


[83] 


EpiExplorer 


Various 


Web tool that allows you to use large reference 
epigenome datasets for your own analysis without 


complex scripting or preprocessing. 


[58] 


Epigenome 


Browser 


Various 


Resource for visualizing and interacting with 
whole-genome datasets. The browser currently 
hosts Human Epigenome Atlas data produced by 


the Roadmap Epigenomics project. 


[84] 
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Software Method Summary Publication 
MEDIPS MeDIP-seq, Bioconductor package providing a comprehensive [38] 
approach for normalizing and analyzing MeDIP- 
seq data 
MeDUSA MeDIP-seq, Performs a full analysis of MeDIP-seq data, [40] 


including sequence alignment, QC and 


determination and annotation of DMRs 


MeQA MeDIP-seq, Pipeline for the pre-processing, quality assessment, [39] 
read distribution and methylation estimation for 


MeDIP-seq datasets 


MethMarker Validation Implements a systematic workflow for design, [85] 
optimization and (computational) validation of 


DNA methylation biomarkers. 


Methylcoder Bisulphite Software pipeline for bisulfite-treated sequences [86] 


MethylSeekR Bisulphite Accurately identifies the footprints of active [87] 


regulatory regions from bisulfite-sequencing data 


Metmap Methyl-seq Produces corrected site-specific methylation states [88] 
from MethylSeq experiments and annotates 


unmethylated islands across the genome. 


Sherman Validation Simulates ungapped high-throughput datasets for http://tinyurl.com/bwkttgh 
bisulfite sequencing. Allows for evaluation of the 
influence of common problems observed in many 


sequencing experiments. 


Table 1. Examples of software available for the analysis of 2GS methylation data. 


Buoyed by the success of combining chromatin immunoprecipitation with second generation 
sequencing for genome-wide studies of histone modifications and transcription factor binding 
sites [28] (termed ChIP-seq), similar techniques were adopted for methylation. These methods 
generally involve either enrichment through methylcytosine-specific protein domains (e.g. 
MethylCap[29], MBD-seq[30]) or through antibody-mediated immunoprecipitation (e.g. 
MeDIP[31], MRE-seq[32]) prior to sequencing[33, 34]. Such approaches, whilst not offering the 
resolution of bisulphite sequence data, are both genome-wide and increasingly affordable. 
Concordance in methylation calls between different enrichment and bisulphite methods have 
been shown to be high[35, 36]. In methylated DNA immunoprecipitation (MeDIP), an antibody 
capable of recognizing 5mC is utilized to immunoprecipitate the methylated fraction of the 
genome. One issue that has been highlighted with enrichment methods such as MeDIP, is the 
necessity to take the sequencing to saturation in order to confirm lack of methylation at a CpG 
site. Such a policy would be costly and would generate a vast amount of redundant data and 
as such saturation has not been reached with these methods. Methylation-sensitive restriction 
enzymes (MRE) target unmethylated CpGs for sequencing thus one alternative suggestion is 
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to integrate the MRE-seq method with MeDIP-seq. Such integration will have the benefit of 
reducing the need for saturation sequencing and will highlight regions of intermediate 
methylation, which would be difficult to detect using a single method. Going a step further, 
if coupled with single nucleotide polymorphism (SNP) profiling, it would also be possible to 
detect potential allele-specific epigenetic states[35]. 


MeDIP-seq is a popular enrichment technique for interrogating the methylation status of 
cytosines across entire genomes. It has been used in numerous studies including the first 
mammalian methylome [33] and the first cancer methylome [26]. In the next section, ap- 
proaches for the analysis of MeDIP-seq data will be discussed in greater detail. 


3. Computational approaches for the analysis of MeDIP-seq data 


A number of computational tools have been developed for the analysis of MeDIP data (Table 
1), including Batman [33], MEDME [37], MEDIPS [38], MeQA [39] and MeDUSA [40]. The 
method to use depends very much on the questions you want to ask of the data, and as a result 
the type of analysis performed can be described as analyzing absolute methylation or, 
alternatively, relative methylation. 


3.1. Absolute methylation 


The efficiency of immunoprecipitation in MeDIP is largely dependent on the density of 
methylated CpG sites. Therefore, it is difficult to distinguish true variation in enrichment, and 
hence methylation, from confounding effects caused by fluctuations in CpG density. This bias 
needs to be corrected for in order to perform accurate and biologically relevant comparisons 
of methylation states between different genomic regions. 


The first method to try and correct for this bias was called Batman (Bayesian Tool for Meth- 
ylation Analysis)[33]. This tool was originally written to analyse MeDIP-chip data, but can also 
be applied to 2GS. Batman, distributed as a suite of Java scripts, models the effect of varying 
densities of methylated CpGs on MeDIP enrichment, resulting in the transformation of the 
count of the aligned sequence read depth within a 100bp region into a quantitative measure 
of DNA methylation. Such data can then be used to compare global methylation scores 
between methylomes or between feature types (e.g. CpG islands, exons) within a methylome. 
Batman was used for the analysis of the first mammalian methylome[33] and also the first 
cancer methylome[26]. Unfortunately, Batman was disproportionately time consuming to run, 
even when running with multiple processors. The R BioConductor package[41] MEDIPS v1.8 
[38] attempted to utilize much of the methodology used in the Batman approach whilst 
outperforming the computation time by orders of magnitude. By implementing MEDIPS as 
an R package, this method is also more approachable for the majority of users, requiring less 
computational knowledge to run the methods. In addition to generating genome-wide 
methylation scores, MEDIPS sought to provide MeDIP-seq specific quality control metrics 
such as calculating the degree of enrichment of CpG-rich sequenced reads relative to genomic 
background. Finally, MEDIPS provided a methodology for determining the location of 
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differentially methylated regions (DMRs) between samples. Whilst MEDIPS, building on the 
strengths of Batman, undoubtedly provided an important step forward in the analysis of 
MeDIP-seq data, it also had significant issues that need to be considered both before use and 
when interrogating output from the program. For example, the DMR calling algorithm 
requires an input sample to be sequenced in addition to the immunoprecipitated sample, thus 
effectively doubling costs. 


3.2. Relative methylation 


Methods for calculating absolute methylation have proven to be useful when identifying large 
global changes, for example hypomethylation of satellite repeats in peripheral nerve sheath 
tumours[26]. Additionally, transforming MeDIP-seq data from read counts to a methylation 
score has assisted in validating experiments against bisulphite data[33]. However, as yet, these 
methods have not provided a framework for determining the location of DMRs ina statistically 
rigorous manner. To achieve this, relative changes in DNA methylation between cohorts can 
be determined, rather than absolute changes within a cohort. As such the problem has much 
in common with other sequencing protocols, such as identifying differential expression 
between RNA-segq cohorts or identifying peaks from a ChIP-seq sample. This commonality 
opens up an abundance of methods that can be used or adapted for MeDIP-seq sample analysis, 
for example peak finding using MACS[42, 43], or DMR finding using DESeq [44] or edgeR [45]. 


There are several hurdles to cross when analysing MeDIP-seq data, particularly during the 
identification of DMRs. Read counts need to be normalized to eliminate biases as a result of 
variability in sequencing depth between samples. Whilst global read count normalization can 
help address this problem, it does not account for ‘competition’ effects. RNA-seq provides an 
example of such effects, in which condition specific highly expressed genes can lead to a 
depressed read count in other genes and hence a bias when comparing samples[46]. An 
analogous situation can be found in MeDIP-seq, where sample-specific repeat methylation 
could potentially diminish reads in other genomic regions and introduce bias to analyses, 
particularly given the large amount of repetitive sequence methylated in the genome. Further, 
despite falling sequencing costs, MeDIP-seq experiments will often have few biological 
replicates. As a result, it can be difficult to obtain reliable estimates of model parameters to fit 
statistical models and thereby locate real differences between samples. By using methods such 
as DESeq that estimate variance in a local fashion, it is possible to remove potential selection 
biases [44]. Additionally, DESeq estimates a flexible, mean-dependent local regression rather 
than attempting to reliably estimate both the variance and mean parameters of the distribution 
from limited numbers of replicates. Typically, there is enough data available in these experi- 
ments to allow for sufficiently precise local estimation of the dispersion [44] and hence avoid 
bias towards certain areas of the dynamic range when identifying DMRs. Finally, accurate 
biological interpretation could be compromised by differences in DNA fragment size distri- 
butions between samples. Performing fragment length normalization through read sub- 
sampling to equalize the distributions can eliminate this potential bias. 


Additionally, the methods developed for absolute methylation calculation are unable to take 
account of non-CpG methylation and, due to the models used being based on local CpG 
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density, the presence of non-CpG methylation could adversely skew the output. In contrast, 
a relative methylation approach should be able to locate differences in methylation driven by 
asymmetric non-CpG methylation[47], taking advantage of the affinity of the MeDIP-seq 
antibody for methylated cytosine (rather than exclusively selecting for methylated CpGs). 


The first pipeline to provide a comprehensive methodology for analyzing MeDIP-seq data, 
with the focus on accurate and statistically rigorous identification of DMRs, was MeDUSA 
(Methylated DNA Utility for Sequence Analysis) (https://www.ucl.ac.uk/cancer/medical- 
genomics/medusaproject) [40]. MeDUSA (v1) utilized a number of software packages to 
perform a complete analysis of MeDIP-seq data. This included sequence alignment, quality 
control (QC), and determination and annotation of DMRs. The novel aspect of MeDUSA was 
the approach to DMR calling. It utilized the USeq suite of tools, specifically MultipleReplicaS- 
canSeqs (MRSS) and EnrichedRegionMaker [48]. MRSS formatted data for use in the BioCon- 
ductor package DESeq [44]. DESeq determined significant differential counts between cohorts. 
These regions are passed to EnrichedRegionMaker to determine if multiple regions can be 
combined to create single larger regions. MeDUSA proceeded to provide initial annotation of 
these DMR regions. 


More recently new versions of both MEDIPS (v1.10) and MeDUSA (v2) have been released. 
The MEDIPS package now incorporates methods from the edgeR [45] bioconductor package 
to provide a DMR calling methodology analogous to that used in MeDUSA. However, the 
approach and implementation employed by MEDIPS is more efficient (both time and compu- 
tational) than the DMR calling method used in MeDUSA v1. As a consequence, MeDUSA (v2) 
now utilises MEDIPS for the DMR calling stage of the pipeline. 


4, Data integration 


As more studies are published and sequencing costs fall, the opportunity to integrate meth- 
ylation datasets with other data types increases[49]. Whilst being able to detect changes in 
methylation is interesting, it is more interesting, and indeed more likely to be of functional 
importance, if this change associates with other detectable biological signals. For example, the 
potential of associating a methylation change with a corresponding change in transcription of 
a particular splice variant[50-52] from RNA-seq, or with an increase in binding of a specific 
transcription factor using ChIP-seq data[53]. 


In addition to the published sequence and array based datasets stored in public repositories 
such as GEO[54], a number of datasets are pre-loaded in public Genome Browsers. For 
example, the UCSC Genome Browser provides access to data from the ENCODE project[55], 
including expression data in the form of RNA-seq and regulatory data generated through 
ChIP-seq representing several different cell lines and various primary tissue types. Com- 
pressed file formats such as bigWig and bigBed[56] make it relatively simple to load and 
visualize multiple data types (Figure 1) whilst software such as bedTools[57] allow for quick 
intersections between data to be determined. EpiExplorer functions as a user-friendly web- 
based solution for providing initial annotations of feature sets [58], such as differentially 
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methylated regions. It enables exploratory analysis of user-uploaded data and provides links 
to many external public datasets. As datasets become larger and more complex, other methods 
of integration may be required, for example an unsupervised clustering approach may be 
useful [49, 59]. 
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Figure 1. Visualising MeDUSA output in UCSC Genome Browser. MeDIP tracks are shown for 3 embryonic stem cell 
(ESC) replicates and 3 Mouse embryonic fibroblasts (MEF) replicates over the Hoxc13 gene. The CpG island in the pro- 
moter region is hypomethylated in the ESC samples, suggesting more permissible chromatin in ESCs than in MEFs. 
This is supported by the ES-CJ7 DNase I Hypersensitivity track. Additionally the RNA-seq tracks show transcriptional 
differences in this gene between ESCs and MEFs. 


In addition to transcriptomic and regulatory data, it is also possible to integrate methylation 
data with genomic information. A perceived difference in methylation at a given CpG 
dinucleotide between samples could be caused by one sample possessing a methylated 
cytosine whilst the other sample possesses an unmethylated cytosine. Alternatively, the 
methylation difference could be due to the presence of a SNP, seeing the cytosine replaced 
with an alternative base. Therefore, the use of genotype profiling can clarify whether a 
methylation difference is a result of genetic or epigenetic changes. The need to consider both 
genetic and epigenetic changes came to the fore with the release of the Illumina Infinium 
HumanMethylation450 BeadChip. This chip allows for the interrogation of 485000 potential 
sites of methylation. However, a significant proportion of these sites are also sites of known 
SNPs[60]. Thus, any difference detected at these sites could be driven by epigenetic or genetic 
factors. Whilst this is an issue for the array analysis, tools such as Bis-SNP are able to make 
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SNP calls from bisulphite sequencing data, in doing so allowing for both accurate quantifica- 
tion of methylation levels and for identification of allele-specific epigenetic events such as 
imprinting [61]. 


A recent study utilised a combination of SNP, expression and methylation data to determine 
whether methylation has a passive or active role in gene regulation [62]. Three models were 
considered for the relationship between methylation and regulation. The first model described 
how a SNP would independently influence expression and methylation, for example through 
SNP modification of a transcription factor binding site (the impact on methylation of small 
changes to nucleotides constituting a TFBS have been explored in a recent tri-primate meth- 
ylome study [89]). In the second model, a SNP would impact upon methylation, which, in turn, 
would modify expression. The final model shows a SNP affecting expression that consequently 
alters the methylation state. It was found that, in reality, each of these models occurs in different 
contexts with the frequency of the model varying according to cell type [62, 63]. Such studies 
underline the complexity inherent in, and the difficulty in deciphering, regulatory interactions 
and should serve as a warning to those seeking overly simplistic interpretations [63]. 


Extending the genetic effect out from a single site to an entire region, it is possible that 
methylation levels could be strongly influenced by the haplotypic phase[64]. Haplotype 
specific methylation (HSM) is a result of the cumulative methylation effect driven by the phase 
of anumber of CpG-SNPs within the haplotype. This signal was strong enough to be identified 
across the 47kb FTO linkage disequilibrium block[65]. Such a finding is only possible through 
the integration of DNA methylation data and genome wide association study data. It is also 
worth remembering at this juncture that whether a measured methylation difference is due to 
a SNP or not, the downstream impact on the transcriptional potential of the chromosomal 
region in question could be the same. 


5. Future perspectives 


The field of epigenetics and specifically the study of DNA methylation have emerged as major 
areas of research in recent years. This rise can be largely attributed to the impact of emerging 
technologies, particularly 2GS. Projects that would have been perceived as impossible just a 
few years ago have been completed and more are underway. The International Human 
Epigenome Consortium (IHEC) (http://www.ihec-epigenomes.org/) was established to 
provide high-resolution reference epigenome maps to the research community by coordinat- 
ing large-scale international efforts. The grand aim of which is to generate 1000 reference 
epigenomes. Various initiatives worldwide have joined IHEC in an attempt to complete the 
goal. In Europe, the BLUEPRINT Project[66] will take the IHEC goal forward and in doing so 
improve our understanding of the human epigenome — of which the methylome is a key 
constituent. 


There are still many questions associated with the role of DNA methylation. Some with regards 
to the biology, and some the techniques used. It is important to know, for example, if using an 
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enrichment based technique, what the specificity of your antibody is. Different antibodies 
appear to show differing levels of repeat enrichment when performing MeDIP[67]. It would 
be of benefit to standardize these analyses. Similarly, different bisulphite conversion protocols 
may lead to differing conversion success. Global CpG methylation levels obtained from WGBS 
for 3 human embryonic stem cell (HESC) lines showed surprising variability (72% - 85%)[68]. 
This could be due to unstable gain and loss of methylation as previously reported in embryonic 
stem cells (ESCs)[69, 70], but it could also be a result of pre-analysis protocol and lab specific 
differences in sample preparation. Equally, it will be interesting to discover more about the 
biological roles and genomic location of the different cytosine modifications (5-hydroxyme- 
thylcytosine[47], 5-Formylcytosine and 5-Carboxylcytosine[71]) and also non-CpG methyla- 
tion. 


New technologies with the potential for adaption for the analysis of DNA methylation are 
being developed constantly. For example, improved methods of methylation validation would 
be highly beneficial. Often hundreds or thousands of potential candidate regions are generated 
from a multi-sample MeDIP-seq comparison, and similar numbers could be produced by 
future EWAS (Epigenome-Wide Association Studies)[72]. Ideally, many of these regions 
would be validated using a different technology. Targeted bisulphite sequencing is often used, 
however this can often be laborious and time-consuming. Combining new technologies such 
as microdroplet-based PCR target enrichment (e.g. RainDance Technologies) with 2GS has 
recently been developed into a high-throughput platform termed RainDropBS-seq [73], 
providing an excellent option to remove the validation bottle-neck. There is also the emergence 
of third generation sequencing on the horizon. Third generation sequencing (3GS) theoretically 
promises many advantages over existing 2GS methods including higher throughput, longer 
read lengths, improved accuracy and requiring smaller amounts of starting material[74], 
indeed some companies e.g. Oxford Nanopore Technologies, are promising single molecule 
sequencing[75, 76]. The potential of single molecule nanopore sequencing is particularly 
exciting for researchers working in the field of DNA methylation. Theoretically, it should be 
possible to sequence complex mammalian genomes and determine any base modifications 
such as methylation[77], potentially including hitherto undiscovered modifications, without 
the need for any of the treatments or enrichments discussed above. 


As the large-scale projects, such as IHEC, BLUEPRINT and increasingly clinically oriented 
projects such as OncoTrack progress, it is expected that many methods and tools will become 
standardized. This will be an important step in translating epigenomic knowledge from the 
bench to the clinic[78, 79]. In the future, it is hoped that a patient will be treated with drugs 
tailored to their particular condition — this is of particular relevance for cancer patients. 
Preliminary work using whole genome, exome and RNA-seq has demonstrated the potential 
for treating a real patient in a relatively short time period (24 days) and a relatively low cost 
(~$3600)[80]. Adding reliable epigenetic information, utilising the IHEC reference genomes, 
to this diagnostic toolbox is a logical next step. Extrapolating from these advances, it is quite 
clear that the bottleneck is shifting from logistics and data generation to computational 
analysis. 
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